34 research outputs found

    Extending R2RML-F to support dynamic datatype and language tags

    Linked data is often generated from raw data with the help of mapping languages. Complex data transformation is an essential part of uplifting data, and it can either be implemented as a custom solution or separated from the mapping process. In this paper, we propose an approach that separates complex data transformations from the mapping process while keeping them reusable across systems. In the proposed method, complex data transformations include the entailment of (i) the language tag and (ii) the datatype present at the data source. The proposed method also includes inferring missing datatype information. We extended R2RML-F to handle data transformations. The results showed that transformation functions could be used to create typed literals dynamically. Our approach is validated on the test cases specified by the RDF Mapping Language (RML). The proposed method considers data in the form of JSON, thus making the system interoperable and reusable.
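    As a rough illustration of what creating typed literals dynamically can look like, the sketch below builds an RDF literal from a JSON value, attaching a language tag or datatype when the source provides one and inferring a datatype from the JSON value type otherwise. It is plain Python, not R2RML-F syntax and not the authors' implementation; the function names `infer_datatype` and `to_literal`, the JSON field names, and the mapping from JSON types to XSD datatypes are all assumptions made for this example.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def infer_datatype(value):
    """Guess an XSD datatype from a JSON value (assumed mapping, for illustration)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return XSD + "boolean"
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    return None  # strings stay plain literals


def to_literal(value, lang=None, datatype=None):
    """Serialize a value as an N-Triples-style literal (minimal escaping only)."""
    if isinstance(value, bool):
        lexical = "true" if value else "false"
    else:
        lexical = str(value)
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    if lang:  # a language tag from the source takes precedence
        return f'"{escaped}"@{lang}'
    datatype = datatype or infer_datatype(value)
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


# Hypothetical source record: the tag and datatype live in the data itself.
record = json.loads('{"label": "Dublin", "lang": "en", "founded": "0841-01-01", "type": "date"}')
label_literal = to_literal(record["label"], lang=record["lang"])
founded_literal = to_literal(record["founded"], datatype=XSD + record["type"])
```

    Here the language tag and datatype are read from the record at run time rather than fixed in the mapping, which is the behaviour the abstract describes at a high level.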

    Linked Data Quality Assessment: A Survey

    Data is of high quality if it is fit for its intended use in operations, decision-making, and planning. There is a colossal amount of linked data available on the web. However, it is difficult to understand how well the linked data fits into the modeling tasks due to the defects present in the data. Faults that emerge in linked data spread far and wide, affecting all the services built on it. Addressing linked data quality deficiencies requires identifying quality problems, assessing quality, and refining the data to improve its quality. This study aims to identify existing end-to-end frameworks for quality assessment and improvement of data quality. One important finding is that most of the work deals with only one aspect rather than a combined approach. Another finding is that most of the frameworks aim at solving problems related to DBpedia. Therefore, a standard scalable system is required that integrates the identification of quality issues, the evaluation, and the improvement of linked data quality. This survey contributes to understanding the state of the art of data quality evaluation and data quality improvement. An ontology-based solution is also proposed to build an end-to-end system that analyzes the root causes of quality violations.

    (Linked) Data Quality Assessment: An Ontological Approach

    The effective functioning of data-intensive applications usually requires that the dataset be of high quality. Quality depends on the task the data will be used for. However, it is possible to identify task-independent data quality dimensions that relate solely to the data themselves and can be extracted with the help of rule mining or pattern mining. In order to assess and improve data quality, we propose an ontological approach to report triples that violate data quality. Our goal is to provide data stakeholders with a set of methods and techniques to guide them in assessing and improving data quality.

    Positive impacts of important bird and biodiversity areas on wintering waterbirds under changing temperatures throughout Europe and North Africa

    Migratory waterbirds require an effectively conserved cohesive network of wetland areas throughout their range and life-cycle. Under rapid climate change, protected area (PA) networks need to be able to accommodate climate-driven range shifts in wildlife if they are to continue to be effective in the future. Thus, we investigated geographical variation in the relationship between local temperature anomaly and the abundance of 61 waterbird species during the wintering season across Europe and North Africa during 1990-2015. We also compared the spatio-temporal effects on abundance of sites designated as PAs, Important Bird and Biodiversity Areas (IBAs), both, or neither designation (Unlisted). Waterbird abundance was positively correlated with temperature anomaly, with this pattern being strongest towards north and east Europe. Waterbird abundance was higher inside IBAs, whether they were legally protected or not. Trends in waterbird abundance were also consistently more positive inside both protected and unprotected IBAs across the whole study region, and were positive in Unlisted wetlands in southwestern Europe and North Africa. These results suggest that IBAs are important sites for wintering waterbirds, but also that populations are shifting to unprotected wetlands (some of which are IBAs). Such IBAs may therefore represent robust candidate sites to expand the network of legally protected wetlands under climate change in north-eastern Europe. These results underscore the need for monitoring to understand how the effectiveness of site networks is changing under climate change.

    ATLAS Run 1 searches for direct pair production of third-generation squarks at the Large Hadron Collider


    Measurement of the charge asymmetry in top-quark pair production in the lepton-plus-jets final state in pp collision data at $\sqrt{s}=8\,\mathrm{TeV}$ with the ATLAS detector


    An Ontological Approach for Recommending a Feature Selection Algorithm

    Feature selection plays an important role in machine learning or data mining problems. Removing irrelevant features increases model accuracy and reduces the computational cost. However, selecting important features is not a simple task, as no single feature selection algorithm performs well on all the datasets of interest. This paper addresses the recommendation of a feature selection algorithm based on dataset characteristics and quality. The research uses three types of dataset characteristics along with data quality metrics. The main contribution of the work is the utilization of Semantic Web techniques to develop a novel system that can aid in robust feature selection algorithm recommendations. The system’s strength lies in assisting users of machine learning algorithms by providing more relevant feature selection algorithms for the dataset using an ontology called Feature Selection algorithm recommendation based on Data Characteristics and Quality (FSDCQ). Results are generated using six different feature selection algorithms and four types of classifiers on ten datasets from the UCI repository. Recommendations take the form of “Feature selection algorithm X is recommended for dataset i, as it performed better on dataset j, similar to dataset i in terms of class overlap 0.3, label noise 0.2, completeness 0.9, conciseness 0.8 units.” While the domain-specific ontology FSDCQ was created to aid in the task of algorithm recommendation for feature selection, it is easily applicable to other meta-learning scenarios.
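    The recommendation format quoted in this abstract can be illustrated with a small similarity lookup. The sketch below is a plain nearest-neighbour search over four quality metrics, not the ontology-based FSDCQ system the abstract describes; the dataset names, algorithm names, metric values for the second dataset, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math

METRICS = ("class_overlap", "label_noise", "completeness", "conciseness")


def distance(a, b):
    """Euclidean distance between two metric profiles (assumed similarity measure)."""
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in METRICS))


def recommend(target, known):
    """Return (algorithm, dataset) for the known dataset closest to the target profile."""
    name, entry = min(known.items(), key=lambda kv: distance(target, kv[1]["profile"]))
    return entry["best_algorithm"], name


# Hypothetical previously evaluated datasets and their best-performing algorithm.
known = {
    "dataset_j": {
        "profile": {"class_overlap": 0.3, "label_noise": 0.2,
                    "completeness": 0.9, "conciseness": 0.8},
        "best_algorithm": "algorithm_X",
    },
    "dataset_k": {
        "profile": {"class_overlap": 0.7, "label_noise": 0.5,
                    "completeness": 0.6, "conciseness": 0.4},
        "best_algorithm": "algorithm_Y",
    },
}

# Hypothetical new dataset i, described by the same four metrics.
target = {"class_overlap": 0.28, "label_noise": 0.22,
          "completeness": 0.88, "conciseness": 0.81}

algorithm, similar_dataset = recommend(target, known)
message = f"{algorithm} is recommended for dataset_i, as it performed better on {similar_dataset}"
```

    An ontology-backed system would presumably express these profiles and results as linked data and query them, but the underlying idea of matching a new dataset to a similar, already evaluated one is the same.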